Note: I worked with Cayce Hook and MH Tessler on this problem set.
This is problem set #2, in which we hope you will practice the visualization package ggplot2, as well as hone your knowledge of the packages tidyr and dplyr.
Note, that this example is from the_grammar.R on http://had.co.nz/ggplot2 I’ve adapted this for psych 254 purposes
First install and load the package.
#install.packages("ggplot2")
library(ggplot2)
setwd("~/Documents/STANFORD PHD/Psych 254 Winter 2014/ProblemSets/analyses")
Now we’re going to use qplot. qplot is the easy interface, meant to replace plot. You can give it simple qplot(x,y) examples, or slightly more complex examples like qplot(x, y, col=grp, data=d).
We’re going to be using the diamonds dataset. This is a set of measurements of diamonds, along with their price etc.
head(diamonds)
## carat cut color clarity depth table price x y z
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
qplot(diamonds$carat, diamonds$price)
Scatter plots are trivial, and easy to add features to. Modify this plot so that it uses the dataframe rather than working from variables in the general namespace (good to get away from retyping diamonds$ every time you reference a variable).
qplot(carat, price, data = diamonds)
Try adding clarity and cut, using shape and color as your visual variables.
qplot(carat, price, color = clarity, shape = cut, data = diamonds)
One of the primary benefits of ggplot2 is the use of facets - also known as small multiples in the Tufte vocabulary. That last plot was probably hard to read. Facets could make it better. Try adding a facets = x ~ y argument. x ~ y means row facets are by x, column facets by y.
qplot(carat, price, color = clarity, shape = cut, facets = clarity ~ cut, data = diamonds)
But facets can also get overwhelming. Try to strike a good balance between color, shape, and faceting.
HINT: facets = . ~ x puts x on the columns, but facets = ~ x (no dot) wraps the facets. These are underlying calls to different functions, facet_wrap (no dot) and facet_grid (two arguments).
qplot(carat, price, color = clarity, facets = . ~ cut, data = diamonds)
The basic unit of a ggplot plot is a “geom” - a mapping between data (via an “aesthetic”) and a particular geometric configuration on coordinate axes.
Let’s try some other geoms and manipulate their parameters. First, try a histogram (geom="hist").
qplot(carat, fill = clarity, data = diamonds)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
#I tried setting geom to "hist", but it returned the following error: "No geom called hist", so I called qplot with only the x-variable defined.
Now facet your histogram by clarity and cut.
qplot(carat, fill = clarity, facets = clarity ~ cut, data = diamonds)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
I like a slightly cleaner look to my plots. Luckily, ggplot allows you to add “themes” to your plots. Try doing the same plot but adding + theme_bw() or + theme_classic(). Different themes work better for different applications, in my experience.
#Below I try `theme_bw()` on the histogram of carats
qplot(carat, fill = clarity, facets = clarity ~ cut, data = diamonds) +
theme_bw()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
#Next I try `theme_classic()` on the histogram of carats
qplot(carat, fill = clarity, facets = clarity ~ cut, data = diamonds) +
theme_classic()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
ggplot is just a way of building qplot calls up more systematically. It’s sometimes easier to use and sometimes a bit more complicated. What I want to show off here is the functionality of being able to build up complex plots with multiple elements. You can actually do this using qplot pretty easily, but there are a few things that are hard to do.
ggplot is the basic call, where you specify A) a dataframe and B) an aesthetic mapping from variables in the plot space to variables in the dataset.
d <- ggplot(diamonds, aes(x=carat, y=price)) # first you set the aesthetic and dataset
d + geom_point() # then you add geoms
d + geom_point(aes(colour = carat)) # and you can keep doing this to add layers to the plot
Try writing this as a single set of additions (e.g. one line of R code, though you can put in linebreaks). This is the most common workflow for me.
ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(colour = carat))
You can also set the aesthetic separately for each geom, and make some great plots this way. Though this can get complicated. Try using ggplot to build a histogram of prices.
ggplot(diamonds, aes(x=price)) +
geom_histogram(aes(fill = clarity)) +
facet_grid(clarity ~ cut) +
theme_bw()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
Sklar et al. (2012) claims evidence for unconscious arithmetic processing. We’re going to do a reanalysis of their Experiment 6, which is the primary piece of evidence for that claim. The data are generously contributed by Asael Sklar.
First let’s set up a few preliminaries.
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.1.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.1.2
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
sem <- function(x) {sd(x) / sqrt(length(x))}
ci95 <- function(x) {sem(x) * 1.96}
First read in two data files and subject info. A and B refer to different trial order counterbalances.
subinfo <- read.csv("../data/sklar_expt6_subinfo_corrected.csv")
d.a <- read.csv("../data/sklar_expt6a_corrected.csv")
d.b <- read.csv("../data/sklar_expt6b_corrected.csv")
Gather these datasets into long form and get rid of the Xs in the headers.
d.a.tidy = d.a %>%
gather(subid, RT, starts_with("X"))
d.b.tidy = d.b %>%
gather(subid, RT, starts_with("X"))
#I remove the Xs below
Bind these together. Check out bind_rows.
d.ab.tidy = bind_rows(d.a.tidy, d.b.tidy)
## Warning: Unequal factor levels: coercing to character
#Below I remove the X's from the subid and re-code subid as a number
d.ab.tidy = d.ab.tidy %>%
mutate(subid = as.integer(gsub("X", "", subid)))
Merge these with subject info. You will need to look into merge and its relatives, left_join and right_join. Call this dataframe d, by convention.
#Merge data with subject info
d.full = right_join(d.ab.tidy, subinfo, by = "subid")
#I reorder the columns, so that all the subject info comes before the RT
d = select(d.full, subid, prime, prime.result, target, congruent, operand, distance, counterbalance, presentation.time, subjective.test, objective.test, RT)
Clean up the factor structure.
d$presentation.time <- factor(d$presentation.time)
levels(d$operand) <- c("addition","subtraction")
d$subid <- factor(d$subid)
d$counterbalance <- factor(d$counterbalance)
str(d)
## Classes 'tbl_df', 'tbl' and 'data.frame': 6468 obs. of 12 variables:
## $ subid : Factor w/ 42 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ prime : Factor w/ 154 levels "=1+2+5","=1+2+9",..: 1 3 5 8 10 11 12 14 15 17 ...
## $ prime.result : int 8 9 8 10 12 13 12 11 12 13 ...
## $ target : int 9 11 12 12 11 12 11 10 11 9 ...
## $ congruent : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ operand : Factor w/ 2 levels "addition","subtraction": 1 1 1 1 1 1 1 1 1 1 ...
## $ distance : int -1 -2 -4 -2 1 1 1 1 1 4 ...
## $ counterbalance : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
## $ presentation.time: Factor w/ 2 levels "1700","2000": 1 1 1 1 1 1 1 1 1 1 ...
## $ subjective.test : int 0 0 0 0 0 0 0 0 0 0 ...
## $ objective.test : num 0.587 0.587 0.587 0.587 0.587 ...
## $ RT : int 597 699 700 628 768 595 664 803 767 700 ...
Examine the basic properties of the dataset. First, take a histogram.
ggplot(d, aes(x = RT)) +
geom_histogram()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
Challenge question: what is the sample rate of the input device they are using to gather RTs?
#I did not have a chance to answer this challenge question.
Sklar et al. did two manipulation checks. Subjective - asking participants whether they saw the primes - and objective - asking them to report the parity of the primes (even or odd) to find out if they could actually read the primes when they tried. Examine both the unconscious and conscious manipulation checks (this information is stored in subinfo). What do you see? Are they related to one another?
ggplot(subinfo, aes(x = objective.test)) +
geom_histogram() +
facet_grid(~subjective.test)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(subid, objective.test, color = subjective.test, facets = . ~ subjective.test, data = subinfo)
#From these plots above, it appears that subjects who report seeing the prime, on average perform higher on the objective test.
r = lm(objective.test ~ subjective.test, data = d)
summary(r)
##
## Call:
## lm(formula = objective.test ~ subjective.test, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3162 -0.1130 -0.0032 0.0802 0.3791
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.54276 0.00261 208 <2e-16 ***
## subjective.test 0.21089 0.00370 57 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.149 on 6466 degrees of freedom
## Multiple R-squared: 0.335, Adjusted R-squared: 0.335
## F-statistic: 3.25e+03 on 1 and 6466 DF, p-value: <2e-16
#A linear regression testing whether performance on the subjective test predicts performance on the objective test reveals a significant relationship between these two manipulation checks: If subjects pass the subjective test, they are more likely to perform higher on the objective test (b =0.21, p < 0.001). In other words, as concluded from the plots above, subjects who report seeing the prime on average perform higher on the objective test than subjects who claim they did not see the prime.
OK, let’s turn back to the measure and implement Sklar et al.’s exclusion criterion. You need to have said you couldn’t see (subjective test) and also be not significantly above chance on the objective test (< .6 correct). Call your new data frame ds.
ds = d %>%
filter(subjective.test == 0, objective.test < 0.6)
#View(ds)
#Below I am just checking that the filter did what I intended.
summarise(ds, max = max(subjective.test))
## Source: local data frame [1 x 1]
##
## max
## 1 0
summarise(ds, max = max(objective.test))
## Source: local data frame [1 x 1]
##
## max
## 1 0.5873
Sklar et al. show a plot of a “facilitation effect” - the time to respond to incongruent primes minus the time to respond to congruent primes. They then show plot this difference score for the subtraction condition and for the two presentation times they tested. Try to reproduce this analysis.
HINT: first take averages within subjects, then compute your error bars across participants, using the sem function (defined above).
ds_summary = ds %>%
group_by(subid, presentation.time, operand, congruent) %>%
summarise(avg = mean(RT, na.rm = T)) %>%
spread(congruent, avg) %>%
mutate(diff = no - yes) %>%
group_by(operand, presentation.time) %>%
summarise(avg = mean(diff), err = sem(diff))
ds_summary
## Source: local data frame [4 x 4]
## Groups: operand
##
## operand presentation.time avg err
## 1 addition 1700 -13.929 8.902
## 2 addition 2000 4.026 5.022
## 3 subtraction 1700 20.999 6.275
## 4 subtraction 2000 9.976 4.447
Now plot this summary, giving more or less the bar plot that Sklar et al. gave (though I would keep operation as a variable here. Make sure you get some error bars on there (e.g. geom_errorbar or geom_linerange).
library(ggplot2)
sklar_plot = ggplot(data = ds_summary, aes(x = presentation.time, y = avg, fill = presentation.time))
sklar_plot +
geom_bar(stat = "identity") + facet_wrap(~operand) +
geom_errorbar(width = .1, aes(ymin = avg - err, ymax = avg + err))
## Warning: Stacking not well defined when ymin != 0
What do you see here? How close is it to what Sklar et al. report? Do the error bars match? How do you interpret these data?
Response: The average facilitation effects for the subtraction primes for each of the presentation times appear to match the averages reported and plotted in the Sklar et al. paper. However, my error bars do not match. It appears that Sklar et al. plotted half of the standard error above and below the mean rather than plotting one full standard error above and below the mean.
Challenge problem: verify Sklar et al.’s claim about the relationship between RT and the objective manipulation check.
qplot(objective.test, RT, data = ds)
## Warning: Removed 87 rows containing missing values (geom_point).
cor.test(ds$objective.test, ds$RT, use = pairwise.complete.obs)
##
## Pearson's product-moment correlation
##
## data: ds$objective.test and ds$RT
## t = 2.035, df = 2529, p-value = 0.04193
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.001478 0.079274
## sample estimates:
## cor
## 0.04044
From the plot, there does not appear to be a relationship between RT and the objective manipulation. The correlation coefficient between these variables is also low (r = 0.04), suggesting there is no relationship between participants’ performance on the objective manipulation and their reaction times.
Show us what you would do with these data, operating from first principles. What’s the fairest plot showing a test of Sklar et al.’s original hypothesis that people can do arithmetic “non-consciously”?
#First, I would look at the data without excluding any participants. I would also use confidence intervals rather than standard error bars.
d_summary = d %>%
group_by(subid, presentation.time, operand, congruent) %>%
summarise(avg = mean(RT, na.rm = T)) %>%
spread(congruent, avg) %>%
mutate(diff = no - yes) %>%
group_by(operand, presentation.time) %>%
summarise(avg = mean(diff), err = sem(diff), c.95 = ci95(diff))
ggplot(data = d_summary, aes(x = presentation.time, y = avg, fill = presentation.time)) +
geom_bar(stat="identity") +
facet_wrap(~operand) +
geom_errorbar(width=.1, aes(ymin=avg-c.95, ymax=avg+c.95)) +
theme_bw()
## Warning: Stacking not well defined when ymin != 0
#I am somewhat concerned with the authors' exclusion criteria because such a high number of subjects are excluded from the analysis. Below, I first only exclude subjects who passed the subjective test. Second, I only exclude subjects who scored higher than .6 on the objective test. I compare the plots of the data using these exclusion criteria.
ds_st = d %>%
filter(subjective.test == 0)
ds_st_summary = ds_st %>%
group_by(subid, presentation.time, operand, congruent) %>%
summarise(avg = mean(RT, na.rm = T)) %>%
spread(congruent, avg) %>%
mutate(diff = no - yes) %>%
group_by(operand, presentation.time) %>%
summarise(avg = mean(diff), err = sem(diff), c.95 = ci95(diff))
ggplot(data = ds_st_summary, aes(x = presentation.time, y = avg, fill = presentation.time)) +
geom_bar(stat="identity") +
facet_wrap(~operand) +
geom_errorbar(width=.1, aes(ymin=avg-c.95, ymax=avg+c.95)) +
theme_bw()
## Warning: Stacking not well defined when ymin != 0
ds_ot = d %>%
filter(objective.test < 0.6)
ds_ot_summary = ds_ot %>%
group_by(subid, presentation.time, operand, congruent) %>%
summarise(avg = mean(RT, na.rm = T)) %>%
spread(congruent, avg) %>%
mutate(diff = no - yes) %>%
group_by(operand, presentation.time) %>%
summarise(avg = mean(diff), err = sem(diff), c.95 = ci95(diff))
ggplot(d = ds_ot_summary, aes(x = presentation.time, y = avg, fill = presentation.time)) +
geom_bar(stat="identity") +
facet_wrap(~operand) +
geom_errorbar(width=.1, aes(ymin=avg-c.95, ymax=avg+c.95))
## Warning: Stacking not well defined when ymin != 0
Visualizing the data with different exclusion criteria, reveals that the results Sklar et al. found are largely driven by the specific exclusion criteria they use. If you use only one of the criteria they implement (i.e., only inluding subjects who fail the subjective test OR only including subjects who score less than 0.6 on the objective test), the facilitation effect disappears for the 2000ms presentation-time and decreases for the 1700ms presentation-time. Also, using confidence intervals instead of standar-error bars reveals that the effects Sklar et al. find are smaller than implied by the figure in the paper.
#I also made this plot. It is not easy to read and did not turn out to be that informative. I made it because I was curious to see if there were any differences among subjects in ow they responded to incongruent vs. congruent primes.
ggplot(ds, aes(x = RT, fill = congruent)) +
geom_histogram(position = position_dodge()) +
facet_wrap(operand ~ subid)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
Challenge problem: Do you find any statistical support for Sklar et al.’s findings?
#I used Sklar et al.'s exclusion criteria in the analysis below.
library(lme4)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following object is masked from 'package:tidyr':
##
## expand
##
## Loading required package: Rcpp
r2 = ds %>%
filter(operand == 'subtraction') %>%
lmer(RT ~ congruent + (1 + congruent|subid), .)
summary(r2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ congruent + (1 + congruent | subid)
## Data: .
##
## REML criterion at convergence: 14621
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.333 -0.638 -0.090 0.588 5.069
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subid (Intercept) 12458.7 111.62
## congruentyes 13.4 3.66 -1.00
## Residual 9463.1 97.28
## Number of obs: 1214, groups: subid, 17
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 677.06 27.36 24.75
## congruentyes -15.06 5.65 -2.66
##
## Correlation of Fixed Effects:
## (Intr)
## congruentys -0.256
The study is a repeated measures design, so I fit the data with a linear mixed model with congruence as the predictor (fixed) variable and reaction time as the outcome variable. I also included a random intercept and slope for subject. The model reveals that the congruence of the prime does affect reaction time: participants who saw a congruent prime were significantly faster at responding than participants who saw an incongruent prime (b = -15.06, t = -2.663). The variance of subject (12458.74) is quite large compared to the residual variance (9463.11); thus the random effect of intercept for subjects is accounting for a substantial amount of the variance in our model. The variance of slope (13.42), however, is small compared to the residual variance, but this is likely because the correlation between the random effects of intercept and slope is quite high (r = -1.00). In sum, I do find statistical support for Sklar et al.’s findings.